home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
CD ROM Paradise Collection 4
/
CD ROM Paradise Collection 4 1995 Nov.iso
/
program
/
4cmp22s.zip
/
TR.4TH
< prev
next >
Wrap
Text File
|
1994-10-30
|
12KB
|
403 lines
\ TR PROGRAM
\ TRANSLATES SOURCE FILE INTO DESTINATION FILE.
\ WORKS LIKE UNIX tr WITH FOLLOWING EXCEPTIONS:
\ 1. -A OPTION NEEDED FOR ASCII FILES.
\ 2. HANDLES NULL CHARACTERS!
\ This program Copyright (C) 1985 by Thomas Almy.
\ Permission is granted to registered users of ForthCMP to sell or distribute
\ computer programs incorporating the compiled contents of this file.
0 [IF]
( note -- program has been modified since writing this paper, and
this paper benchmarks the original CP/M version of the compiler)
TRANSLATE PROGRAM
by Tom Almy
August 1985
PROGRAM DESCRIPTION
This program was designed to mimic the functionality of the "tr"
program provided on UNIX (tm Bell Labs) systems. While written to be
compiled with the author's ForthCMP Forth Compiler, it can be
utilized on any 83 Standard system by providing an appropriate file
system interface.
TR is used to make one or more single character substitutions in
a file. ForthCMP's FILTER file interface allows specifying an input
file and an optional output file (if no output file is specified,
output goes to the display). The file name(s) are followed by an
optional option specification and one or two character specifying
strings.
Characters in the strings may be any character except "\"
(backslash) or "-" (hyphen). Any of the 256 possible character codes
can be specified by backslash followed by one, two, or three octal
digits. Backslash followed by a lower case character becomes an
upper-case character (done to allow putting the string on a CP/M
command line). Backslash followed by any other character is that
character, so "-" can be represented as "\-" and "\" can be
represented as "\\". A range of characters can be represented by the
first character followed by a hyphen followed by the last character.
If no options are specified, a translation occurs in which each
character which is in the first specification string is replaced with
the character in the same position in the second specification
string. If the second string is shorter than the first then the
string is extended by appending copies of its last character.
The option specifcation can contain any or all of the following
characters:
A ASCII mode: On input CR is deleted (leaving just LF of CR-LF pairs)
and on output LF is replaced with CR-LF pairs. This allows
translating to or from CR-LF pairs. Additionally, Control-Z denotes
end of file.
C Complement first string: The first string is replaced with a new
string consisting of the characters in the range 0 through 255
not in the first string. This string is sorted.
D Delete instead of translate: No translation takes place;
instead any characters in the first specification string are deleted.
S Squeeze output: Sequential occurrences in the character stream
(after translation/deletion) of two or more identical
characters in the second specification string are squeezed to a
single occurrence.
Example commands:
Options String1 String2 Function
a-z A-Z Upcase file
a-zA-Z A-Za-z Swap case file
AS \12 \12 Delete blank lines
ACS !-~ \12 Put all words on separate lines
AS \40 \12 Put all words on separate lines
ACDS A-Za-z\12\40 Delete all non alphabetics, except
spaces and newlines.
\200-\377 \0-\177 Clear parity bits.
PERFORMANCE
I compared the performance of the Forth program, using the
ForthCMP compiler, with that of C, using the MANX (AZTEC) compiler.
The system used was a LOBO MAX-80, which has a 5-Mhz Z-80 processor,
1.2 MByte 8" floppy drives, and runs CP/M+.
Characteristic Forth C
Source file lines (not blank) 163 139
Compilation time Compile Step 44 44 seconds
Assemble Step none 32
Link Step none 38
TOTAL 44 114
COM file size 3584 9984 bytes
Test case execution time 21 138 seconds
The test case involved upcasing a 14k byte file. The PIP
program (which is written in assembly language) took 16 seconds.
The ForthCMP compiler compiles and links in a single step. 5
seconds was spent producing a load map (not done in the C example),
so the ForthCMP compilation time could really be considered to be 39
seconds.
READING THE LISTING
First, ignore the INCLUDE, ROMABLE, and IN/OUT commands, as they
are directives for the compiler. The definition of CARRAY is "CREATE
ALLOT DOES> +". The definition of C<- is "SWAP C!". The non-standard
words ?DO " <= >= ON OFF ASCII CONTROL SKIP and SCAN and Eaker' case
statement (CASE OF ENDOF ENDCASE) have their usual definitions.
The file interface redefines KEY and EXPECT to read from the
input file. KEY returns -1 on end of file; otherwise it returns the
next character as an integer in the range 0 through 255. Because the
new EXPECT does not echo and has no editing, OLD- EXPECT (which is
system dependent) had to be provided. Output (EMIT and words which
call it) is rewritten to send output to the output file when FILTER
is executed, or to the display when CONSOLE is executed.
SETFILES is used to initialize the input and output files, and
returns TRUE if successful. The double variable OPTIONSTRING is set
to contain a pointer to and length of the command tail (that part
excluding the file specifications). ENDFILES does any necessary file
closing.
[THEN]
\ Modified for new filter August, 1986
\ Modified for newest DOS interface 12/91
100 MSDOS
HEX 4000 DECIMAL CONSTANT BUFSIZ
INCLUDE DOS1
256 CARRAY TRTABLE \ translation table
256 CARRAY SQTABLE \ squeeze duplicates table
CREATE INLIST 512 ALLOT \ instring values
CREATE OUTLIST 512 ALLOT \ outstring values
VARIABLE DEL-FLAG \ deletion flag specified
VARIABLE COM-FLAG \ reverse sense flag specified
VARIABLE SQU-FLAG \ squeeze output string flag
VARIABLE ASC-FLAG \ ascii-mode --> CR dropped on input, added
\ before LF's on output , CONTROL-Z terminates file
VARIABLE ^LIST
VARIABLE LASTCHAR
CTRL M CONSTANT ACR \ Carriage Return
CTRL J CONSTANT ALF \ Line Feed
\ OUTPUT FILE HANDLING ( basically filter.4th )
VARIABLE outhandle stderr outhandle !
VARIABLE outbuffer
VARIABLE outbufptr
VARIABLE outbufend
0 0 IN/OUT
: flushout outbuffer @ outbufptr @ <> IF
outhandle @ outbuffer @ outbufptr @ outbuffer @ - DUP >R write
outbuffer @ outbufptr ! R> <> IF stderr outhandle !
." DISK FULL " flushout 4 RETURN THEN
THEN ;
: EMIT outbufptr @ DUP outbuffer @ BUFSIZ + = IF flushout
DROP outbuffer @ THEN C! 1 outbufptr +! ;
0 0 IN/OUT
: CONSOLE flushout stderr outhandle ! ;
0 0 IN/OUT
: FILTER flushout stdout outhandle ! ;
0 0 IN/OUT : BYE flushout bye ;
0 0 IN/OUT : ABORT flushout 4 RETURN ;
\ INPUT FILE PROCESSING
VARIABLE inbuffer ( pointer to allocated buffer )
VARIABLE inbufptr
VARIABLE inbufend
0 0 IN/OUT
: SETBUFS ( must execute before any I/O to allocate buffers )
129 128 C@ >BUFFER
HERE inbuffer !
BUFSIZ ALLOT
HERE DUP outbuffer ! outbufptr !
BUFSIZ ALLOT
;
\ This version of KEY returns -1 on end of file!
: KEY inbufptr @ inbufend @ = IF ( fetch block )
stdin inbuffer @ BUFSIZ read ?DUP 0= IF ( EOF/ERROR ) -1 EXIT THEN
inbuffer @ + inbufend !
inbuffer @ inbufptr ! THEN
inbufptr @ C@ 1 inbufptr +! ;
\ Commentary
0 0 IN/OUT
: HELLO
." TRANSLATE PROGRAM" CR
." Copyright (C) 1985 by Thomas Almy." CR ;
0 0 IN/OUT
: USAGE
CONSOLE
CR ." [-[A][C][D][S]] str1 [str2]"
CR ." Options are Ascii Complement-str1 Delete Squeeze"
CR ." strings may have \octal or range specifications."
ABORT
;
\ List Accessing
1 0 IN/OUT
: ISLIST ( list -- ) ^LIST ! ;
1 0 IN/OUT
: !LIST ( char -- ) ^LIST @ ! 2 ^LIST +! ;
0 1 IN/OUT
: @LIST ( -- char ) ^LIST @ @ 2 ^LIST +! ;
\ Miscellaneous Subroutines
1 1 IN/OUT
: UPCASE ( char -- char )
DUP [CHAR] a >= IF DUP [CHAR] z <= IF BL - THEN THEN ;
PRIMITIVE
: NEXT-CHAR ( addr len -- addr+1 len-1 char, or zero if end )
DUP IF 1- SWAP COUNT ROT SWAP ELSE FALSE THEN ;
PRIMITIVE
: OCTAL? ( addr len -- addr len boolean )
OVER C@ DUP [CHAR] 0 >= SWAP [CHAR] 7 <= AND ;
: ?BACKSLASH ( addr len char -- addr' len' value )
DUP [CHAR] \ = IF DROP
OCTAL? IF NEXT-CHAR [CHAR] 0 - >R
OCTAL? IF NEXT-CHAR [CHAR] 0 - R> 8 * + >R
OCTAL? IF NEXT-CHAR [CHAR] 0 - R> 8 * + >R
THEN THEN R>
ELSE
NEXT-CHAR
THEN THEN ;
: FILL-LIST ( string length list -- )
ISLIST
BEGIN NEXT-CHAR ?DUP WHILE
DUP [CHAR] - = IF DROP NEXT-CHAR ?BACKSLASH 1+
^LIST @ CELL- @ 1+ DO I !LIST LOOP ELSE
?BACKSLASH !LIST THEN REPEAT
-1 !LIST ( delimit list )
2DROP ;
\ Handle option string
0 0 IN/OUT
: DO-OPTION-STRING
HERE COUNT SWAP 1+ SWAP 1 ?DO
COUNT UPCASE CASE
[CHAR] A OF ASC-FLAG ON ENDOF
[CHAR] D OF DEL-FLAG ON ENDOF
[CHAR] C OF COM-FLAG ON ENDOF
[CHAR] S OF SQU-FLAG ON ENDOF
." UNKNOWN OPTION -- " EMIT USAGE ENDCASE
LOOP
DROP
BL WORD DROP ( scan next word )
;
0 0 IN/OUT
: SET-OPTIONS
ASC-FLAG OFF
DEL-FLAG OFF
COM-FLAG OFF
SQU-FLAG OFF
BL WORD COUNT 0> SWAP C@ [CHAR] - = AND IF ( an option string )
DO-OPTION-STRING
THEN
;
\ Various Table handling routines
1 0 IN/OUT
: SET-SQUTABLE ( hostlist -- ) ISLIST
['] SQTABLE >BODY 256 0 FILL
BEGIN @LIST DUP 0< 0= WHILE
SQTABLE TRUE C<- ( set flag in byte )
REPEAT DROP ;
0 0 IN/OUT
: COMPLEMENT-LIST ( complements INLIST )
INLIST SET-SQUTABLE INLIST ISLIST
256 0 DO I SQTABLE C@ 0= IF I !LIST THEN LOOP
-1 !LIST ;
0 0 IN/OUT
: FILL-TRTABLE ( TRTABLE gets filled from INLIST )
['] TRTABLE >BODY 256 0 FILL
INLIST ISLIST BEGIN @LIST DUP 0< 0= WHILE
TRTABLE TRUE C<- ( set flag in byte )
REPEAT DROP ;
0 0 IN/OUT
: SET-TRTABLE ( TRTABLE is translation table from INLIST to OUTLIST )
256 0 DO I DUP TRTABLE C! LOOP INLIST ISLIST
OUTLIST BEGIN ^LIST @ @ 0< 0= WHILE
DUP @ 0< IF DUP CELL- @ ELSE DUP @ SWAP CELL+ SWAP THEN
@LIST TRTABLE C! REPEAT
DROP ;
\ Information from user?
0 0 IN/OUT
: GET-RANGES
HERE COUNT INLIST FILL-LIST
COM-FLAG @ IF
COMPLEMENT-LIST
THEN
BL WORD COUNT OUTLIST FILL-LIST
SQU-FLAG @ IF
OUTLIST SET-SQUTABLE
THEN
DEL-FLAG @ IF
FILL-TRTABLE
ELSE
SET-TRTABLE
THEN
;
\ Translate functions
PRIMITIVE
: NOT-DELETED? ( key -- key TRUE OR FALSE )
DUP TRTABLE C@ IF DROP FALSE ELSE TRUE THEN ;
1 0 IN/OUT
: SEND-IT SQU-FLAG @ IF
DUP SQTABLE C@ IF
DUP LASTCHAR @ = IF ( a duplicate! )
DROP EXIT THEN THEN
DUP LASTCHAR ! THEN
DUP ALF = IF
ASC-FLAG @ IF
ACR EMIT THEN THEN
EMIT ;
: NEW-KEY? ( -- key TRUE OR FALSE )
ASC-FLAG @ IF
BEGIN KEY DUP ACR = WHILE DROP REPEAT
DUP 0< OVER [CTRL] Z = OR
ELSE
KEY DUP 0<
THEN
IF DROP FALSE ELSE TRUE THEN ;
0 0 IN/OUT
: TRANSLATE
LASTCHAR ON
BEGIN
NEW-KEY?
WHILE
DEL-FLAG @ IF
NOT-DELETED? IF SEND-IT THEN
ELSE
TRTABLE C@ SEND-IT
THEN
REPEAT
;
\ TOP LEVEL
: MAIN
SETBUFS
HELLO
FILTER
SET-OPTIONS
GET-RANGES
TRANSLATE
BYE
;
INCLUDE DOS2
INCLUDE FORTHLIB
END